Self-attention transfer networks for speech emotion recognition

Abstract

A crucial element of human–machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is how to learn robust and discriminative representations from speech. Meanwhile, although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck that impedes the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a method that combines knowledge transfer and self-attention for SER tasks. Here, we apply the log-Mel spectrogram with deltas and delta-deltas as input. Moreover, given that emotions are time-dependent, we apply Temporal Convolutional Neural Networks (TCNs) to model the variations in emotions. We further introduce an attention transfer mechanism, which is based on a self-attention algorithm in order to learn long-term dependencies. The Self-Attention Transfer Network (SATN) in our proposed approach takes advantage of attention autoencoders to learn attention from a source task, and then from speech recognition, followed by transferring this knowledge into SER. Evaluation built on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database demonstrates the effectiveness of the novel model.
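The input representation and the attention step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no extraction parameters, so the regression-based delta formula, the window width of 2 frames, and the helper names `delta`, `stack_with_deltas`, and `self_attention` are all assumptions chosen for illustration. The attention function uses identity projections rather than learned ones, and it does not reproduce the SATN or its transfer step.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis (axis 0).

    features: array of shape (frames, bands), e.g. a log-Mel spectrogram.
    width: frames used on each side of the regression (assumed value).
    """
    frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on both sides.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + frames]
        behind = padded[width - n : width - n + frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(log_mel: np.ndarray) -> np.ndarray:
    """Stack static, delta and delta-delta features as three channels."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return np.stack([log_mel, d1, d2], axis=0)  # shape: (3, frames, bands)


def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention with identity projections.

    x: array of shape (frames, dim). A trained model would apply learned
    query/key/value projections; they are omitted to keep the sketch short.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x  # each output frame is a weighted mix of all frames
```

Given a log-Mel spectrogram of shape (frames, bands), `stack_with_deltas` returns a three-channel array that a convolutional front end could consume, and `self_attention` lets every frame attend to every other frame, which is how self-attention captures long-term dependencies.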

Related resources

Feature Transfer Learning for Speech Emotion Recognition

Speech Emotion Recognition (SER) has achieved some substantial progress in the past few decades since the dawn of emotion and speech research. In many aspects, various research efforts have been made in an attempt to achieve human-like emotion recognition performance in real-life settings. However, with the availability of speech data obtained from different devices and varied acquisition condi...

Self-organizing boolean networks for speech recognition

We show the application of a self-organizing Boolean network to speech recognition. The model consists of a set of two-input Boolean gates that has to implement an n-to-1 Boolean mapping through a learning-by-example procedure. The training scheme is based on an optimization process (simulated annealing). This approach is applied to a simple phoneme recognition task, achieving high accuracy.

Neural Networks for Language Independent Emotion Recognition in Speech

This chapter introduces a neural network based approach for the identification of human affective state in speech signals. A group of potential features are first identified and extracted to represent the characteristics of different emotions. To reduce the dimensionality of the feature space, whilst increasing the discriminatory power of the features, a systematic feature selection approach wh...

Progressive Neural Networks for Transfer Learning in Emotion Recognition

Many paralinguistic tasks are closely related and thus representations learned in one domain can be leveraged for another. In this paper, we investigate how knowledge can be transferred between three paralinguistic tasks: speaker, emotion, and gender recognition. Further, we extend this problem to cross-dataset tasks, asking how knowledge captured in one emotion dataset can be transferred to an...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Journal

Journal title: Virtual Reality & Intelligent Hardware

Year: 2021

ISSN: 2096-5796, 2666-1209

DOI: https://doi.org/10.1016/j.vrih.2020.12.002